library(igraph)
library(tidyverse)
load("got1.Rda") # /home/public/network/got1.Rda
igraph tutorial
Here is a quick presentation of the igraph package. Some functions of the package are exemplified on toy networks.
Throughout this tutorial, you will be reproducing some manipulations with the got1 network.
The GOT (got1) dataset contains the Game of Thrones characters and their interactions in the first season.
There are five interaction types. Character A and Character B are connected whenever:
- Character A speaks directly after Character B
- Character A speaks about Character B
- Character C speaks about Character A and Character B
- Character A and Character B are mentioned in the same stage direction
- Character A and Character B appear in a scene together
Start
Load the igraph library.
My first graph
We can build a graph from a list of edges:
g1 <- make_graph(edges = c("A", "B",
                           "B", "C",
                           "C", "A"), directed = FALSE)
plot(g1)
class(g1)
[1] "igraph"
g1
IGRAPH 22f5385 UN-- 3 3 --
+ attr: name (v/c)
+ edges from 22f5385 (vertex names):
[1] A--B B--C A--C
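The edge vector is read pairwise: elements 1 and 2 form the first edge, elements 3 and 4 the second, and so on. The following is a minimal sketch (the `toy_` names are illustrative and not part of the tutorial) that makes the pairing explicit and builds the same triangle with `graph_from_literal()`:

```r
library(igraph)

# The flat edge vector is consumed two elements at a time
toy_edges <- c("A", "B",
               "B", "C",
               "C", "A")
matrix(toy_edges, ncol = 2, byrow = TRUE)  # one row per edge

toy_g1 <- make_graph(edges = toy_edges, directed = FALSE)

# Same triangle written with the formula-like notation
toy_g1_lit <- graph_from_literal(A - B, B - C, C - A)

# Both constructions yield the same number of edges
ecount(toy_g1) == ecount(toy_g1_lit)
```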
g2 <- make_graph(edges = c("A", "B",
                           "B", "C",
                           "C", "A"), directed = TRUE)
g2
IGRAPH fae42fb DN-- 3 3 --
+ attr: name (v/c)
+ edges from fae42fb (vertex names):
[1] A->B B->C C->A
plot(g2)
g3 <- make_graph(c("John", "Jim", "Jim", "Jack", "Jim", "Jack", "John", "John"),
                 isolates = c("Jesse", "Janis", "Jennifer", "Justin"))
plot(g3)
Edges, Vertices, etc…
ecount(g3)
[1] 4
E(g3)
+ 4/4 edges from 21adb3a (vertex names):
[1] John->Jim Jim ->Jack Jim ->Jack John->John
class(E(g3))
[1] "igraph.es"
vcount(g3)
[1] 7
V(g3)
+ 7/7 vertices, named, from 21adb3a:
[1] John Jim Jack Jesse Janis Jennifer Justin
class(V(g3))
[1] "igraph.vs"
Display the nodes and the edges of got1. How many nodes and edges are there?
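As a hedged illustration of the functions above, here is a small sketch on a toy graph (the graph and the `toy_g` name are made up for this example). It counts vertices and edges and shows two other ways to look at the edges: `as_edgelist()` for a plain matrix and `incident()` for the edges touching one vertex.

```r
library(igraph)

toy_g <- make_graph(c("John", "Jim", "Jim", "Jack", "John", "Jack"),
                    isolates = "Jesse", directed = FALSE)

vcount(toy_g)            # number of vertices, isolates included
ecount(toy_g)            # number of edges
V(toy_g)$name            # vertex names as a character vector
as_edgelist(toy_g)       # two-column matrix, one row per edge
incident(toy_g, "Jim")   # edges touching one vertex
```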
Attributes
Add attributes to nodes
V(g3)$name # automatically generated when we created the network
[1] "John" "Jim" "Jack" "Jesse" "Janis" "Jennifer" "Justin"
V(g3)$gender <- c("male", "male", "male", "male", "female", "female", "male")
# also works with
g3 <- set_vertex_attr(graph = g3, name = "new_attribute", value = 1:7)
Get attributes
vertex_attr(g3)
$name
[1] "John" "Jim" "Jack" "Jesse" "Janis" "Jennifer" "Justin"
$gender
[1] "male" "male" "male" "male" "female" "female" "male"
$new_attribute
[1] 1 2 3 4 5 6 7
Edge attributes
E(g3)$type <- "email" # Edge attribute, assign "email" to all edges
E(g3)$weight <- 10 # Edge weight, setting all existing edges to 10
edge_attr(g3)
$type
[1] "email" "email" "email" "email"
$weight
[1] 10 10 10 10
Which attributes do the nodes have? Which attributes do the edges have?
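To list attribute names without printing every value, `vertex_attr_names()` and `edge_attr_names()` can help. The sketch below uses a made-up toy graph and also exports the attributes as data frames. The `igraph::` prefix is used on `as_data_frame()` on the assumption that other loaded packages may export a function with the same name.

```r
library(igraph)

toy_g <- make_graph(c("John", "Jim", "Jim", "Jack"), directed = FALSE)
V(toy_g)$gender <- c("male", "male", "male")
E(toy_g)$weight <- c(10, 5)

vertex_attr_names(toy_g)   # names of the vertex attributes
edge_attr_names(toy_g)     # names of the edge attributes

# One row per vertex / per edge, one column per attribute
toy_nodes <- igraph::as_data_frame(toy_g, what = "vertices")
toy_links <- igraph::as_data_frame(toy_g, what = "edges")
toy_nodes
toy_links
```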
Import from data.frame
Let’s build a data.frame with 2 columns (from and to):
set.seed(145) # fix the random seed for reproducibility
data.set <- data.frame(from = sample(LETTERS[1:10], size = 20, replace = TRUE),
                       to = sample(LETTERS[1:10], size = 20, replace = TRUE))
g <- igraph::graph_from_data_frame(data.set, directed = FALSE)
g
IGRAPH 80511dd UN-- 10 20 --
+ attr: name (v/c)
+ edges from 80511dd (vertex names):
[1] B--H F--G B--D A--I B--F E--I E--E I--J A--C F--I E--G B--F E--I H--J F--E
[16] D--E B--J A--H B--G A--I
plot(g)
From an adjacency matrix
mat <- matrix(sample(0:1, size = 100, replace = TRUE), ncol = 10)
colnames(mat) <- LETTERS[1:10]
rownames(mat) <- LETTERS[1:10]
mat
  A B C D E F G H I J
A 1 1 0 1 1 1 0 1 0 1
B 0 0 0 1 0 0 1 1 0 0
C 0 0 0 0 1 0 1 0 0 0
D 0 1 0 0 1 1 0 0 0 1
E 1 0 0 0 1 1 0 0 1 1
F 0 0 0 1 1 0 1 1 1 0
G 0 1 0 1 1 1 1 1 1 1
H 1 1 1 1 1 0 1 0 1 1
I 0 0 0 0 0 0 1 0 0 1
J 1 1 1 0 1 0 0 0 0 0
g <- igraph::graph_from_adjacency_matrix(mat)
g
IGRAPH ecb7b31 DN-- 10 48 --
+ attr: name (v/c)
+ edges from ecb7b31 (vertex names):
[1] A->A A->B A->D A->E A->F A->H A->J B->D B->G B->H C->E C->G D->B D->E D->F
[16] D->J E->A E->E E->F E->I E->J F->D F->E F->G F->H F->I G->B G->D G->E G->F
[31] G->G G->H G->I G->J H->A H->B H->C H->D H->E H->G H->I H->J I->G I->J J->A
[46] J->B J->C J->E
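By default, `graph_from_adjacency_matrix()` reads rows as sources, columns as targets, and each entry as a number of edges, which is why the 0/1 matrix above yields as many edges as it has ones. The sketch below, on a made-up 3 x 3 matrix, contrasts that default with `weighted = TRUE`, which keeps one edge per non-zero cell and stores the value as a `weight` attribute.

```r
library(igraph)

toy_mat <- matrix(c(0, 2, 0,
                    0, 0, 1,
                    3, 0, 0),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# Entries as edge counts: A->B is created twice, C->A three times
toy_multi <- graph_from_adjacency_matrix(toy_mat, mode = "directed")
ecount(toy_multi) == sum(toy_mat)

# weighted = TRUE: one edge per non-zero cell, value kept in E()$weight
toy_w <- graph_from_adjacency_matrix(toy_mat, mode = "directed", weighted = TRUE)
ecount(toy_w)
E(toy_w)$weight
```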
plot(g)
Plot
plot(...)
Plotting with igraph: network plots have a wide set of parameters you can set. These include node options (starting with vertex.) and edge options (starting with edge.). A list of selected options is included below, but you can also check out ?igraph.plotting for more information.
| NODES | |
| vertex.color | Node color |
| vertex.frame.color | Node border color |
| vertex.shape | One of “none”, “circle”, “square”, “csquare”, “rectangle”, “crectangle”, “vrectangle”, “pie”, “raster”, or “sphere” |
| vertex.size | Size of the node (default is 15) |
| vertex.size2 | The second size of the node (e.g. for a rectangle) |
| vertex.label | Character vector used to label the nodes |
| vertex.label.family | Font family of the label (e.g. “Times”, “Helvetica”) |
| vertex.label.font | Font: 1 plain, 2 bold, 3 italic, 4 bold italic, 5 symbol |
| vertex.label.cex | Font size (multiplication factor, device-dependent) |
| vertex.label.dist | Distance between the label and the vertex |
| vertex.label.degree | The position of the label in relation to the vertex, in radians: 0 is right, pi is left, pi/2 is below, and -pi/2 is above |
| EDGES | |
| edge.color | Edge color |
| edge.width | Edge width, defaults to 1 |
| edge.arrow.size | Arrow size, defaults to 1 |
| edge.arrow.width | Arrow width, defaults to 1 |
| edge.lty | Line type, could be 0 or “blank”, 1 or “solid”, 2 or “dashed”, 3 or “dotted”, 4 or “dotdash”, 5 or “longdash”, 6 or “twodash” |
| edge.label | Character vector used to label edges |
| edge.label.family | Font family of the label (e.g. “Times”, “Helvetica”) |
| edge.label.font | Font: 1 plain, 2 bold, 3 italic, 4 bold italic, 5 symbol |
| edge.label.cex | Font size for edge labels |
| edge.curved | Edge curvature, range 0-1 (FALSE sets it to 0, TRUE to 0.5) |
| edge.arrow.mode | Vector specifying whether edges should have arrows, possible values: 0 no arrow, 1 back, 2 forward, 3 both |
| OTHER | |
| margin | Empty space margins around the plot, vector with length 4 |
| frame | if TRUE, the plot will be framed |
| main | If set, adds a title to the plot |
| sub | If set, adds a subtitle to the plot |
The first way to modify the default plot is to include those parameters inside plot(…).
plot(g, edge.arrow.size=.2, edge.curved=0,
vertex.color=c(1,2,1,2,1,2,1,2,1,2),
vertex.frame.color="red",
vertex.label.cex=.7,
edge.color="blue",
     main = "Network")
Or you can set plotting parameters as graph attributes:
V(g)$size <- 20
V(g)$frame.color <- "white"
V(g)$color <- "blue"
V(g)$label.color <- "white"
vertex_attr(g)
$name
[1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J"
$size
[1] 20 20 20 20 20 20 20 20 20 20
$frame.color
[1] "white" "white" "white" "white" "white" "white" "white" "white" "white"
[10] "white"
$color
[1] "blue" "blue" "blue" "blue" "blue" "blue" "blue" "blue" "blue" "blue"
$label.color
[1] "white" "white" "white" "white" "white" "white" "white" "white" "white"
[10] "white"
E(g)$color <- "red"
E(g)$lty <- 4
plot(g)
Personal favourite
Graph attributes can be cast from a list to a data.frame.
new_vertex_attr <- vertex_attr(g) %>% as.data.frame() %>%
  mutate(color = ifelse(name == "A", "green", "pink"))
new_vertex_attr
vertex_attr(g) <- new_vertex_attr # %>% as.list()
plot(g)
Plot the graph got1.
Modify the vertex attributes to add the sex of each character (/home/public/network/got_sex.csv).
Change the node color based on the sex.
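One possible approach, sketched on a toy graph: align an external table with the vertex order using `match()`, then translate the category into a color with a named lookup vector. The table `toy_meta` and its column names `name` and `sex` are assumptions made for this example. The actual layout of got_sex.csv is not shown in this tutorial and may differ.

```r
library(igraph)

toy_g <- make_graph(c("Ned", "Cat", "Cat", "Jon", "Jon", "Ned"), directed = FALSE)

# Hypothetical metadata table; the column names `name` and `sex` are assumed
toy_meta <- data.frame(name = c("Jon", "Ned", "Cat"),
                       sex  = c("male", "male", "female"))

# match() aligns the table rows with the vertex order of the graph
V(toy_g)$sex <- toy_meta$sex[match(V(toy_g)$name, toy_meta$name)]

# Named vector used as a lookup table from category to color
sex_colors <- c(male = "lightblue", female = "salmon")
V(toy_g)$color <- unname(sex_colors[V(toy_g)$sex])

plot(toy_g)
```

Matching on the name, rather than assuming the rows are already in vertex order, keeps the attribute correct even when the table is sorted differently from the graph.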
Network layouts
Network layouts are simply algorithms that return coordinates for each node in a network.
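To make this concrete, the short sketch below (toy ring graph, illustrative names) calls a layout function directly and inspects what it returns: a numeric matrix with one row per vertex and one column per coordinate. Because a layout is just a matrix, hand-made coordinates can be passed to `plot()` in the same way.

```r
library(igraph)

toy_net <- make_ring(6)

toy_xy <- layout_in_circle(toy_net)
dim(toy_xy)                      # one row per vertex, columns are x and y
head(toy_xy, 3)

# A layout is just a matrix, so hand-made coordinates work too
toy_manual <- cbind(x = 1:6, y = rep(c(0, 1), 3))
plot(toy_net, layout = toy_manual)
```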
Let’s generate a random graph
net.bg <- sample_pa(80)
V(net.bg)$size <- 8
V(net.bg)$frame.color <- "white"
V(net.bg)$color <- "orange"
V(net.bg)$label <- ""
E(net.bg)$arrow.mode <- 0
plot(net.bg)
Then pass any layout_* function to the layout parameter, which sets the position of each node.
plot(net.bg, layout=layout_randomly)
However, each time you plot the graph with a layout function, new coordinates are computed.
par(mfrow=c(2, 2), mar=c(1,1,1,1))
plot(net.bg, layout=layout_randomly, main = "1")
plot(net.bg, layout=layout_randomly, main = "2")
plot(net.bg, layout=layout_randomly, main = "3")
plot(net.bg, layout=layout_randomly, main = "4")
Alternatively, you can compute the layout once in advance and reuse it:
par(mfrow=c(2, 2), mar=c(1,1,1,1))
l <- layout_randomly(net.bg)
plot(net.bg, layout=l, main = "1")
plot(net.bg, layout=l, main = "2")
plot(net.bg, layout=l, main = "3")
plot(net.bg, layout=l, main = "4")
Some other layouts
par(mfrow=c(2, 2), mar=c(1,1,1,1))
plot(net.bg, layout=layout_in_circle)
plot(net.bg, layout=layout_on_sphere)
plot(net.bg, layout=layout_on_grid)
plot(net.bg, layout=layout_with_fr)
Plot the got1 network and test different layouts.
Graph Topology
Centrality
Degree
V(g)
+ 10/10 vertices, named, from ecb7b31:
[1] A B C D E F G H I J
degree(g)
A B C D E F G H I J
11 8 4 9 13 9 14 12 6 10
hist(degree(net.bg))
# compare to a power law
po <- function(x, k, a) {
  a * x^-k
}
plot(x = 1:20, y = po(x = 1:20, k = 3, a = 2), type = "l")
Plot the degree distribution. What is the highest degree? Which character has the highest degree?
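In a directed graph, `degree()` sums incoming and outgoing edges by default, and a self-loop counts twice. The sketch below, on a made-up directed toy graph, separates these cases and shows one way to find the highest degree and the vertices that reach it, ties included.

```r
library(igraph)

toy_g <- make_graph(c("A", "B", "A", "C", "A", "D", "B", "C", "D", "D"),
                    directed = TRUE)

toy_deg <- degree(toy_g)                 # in + out; the loop D->D counts twice
toy_deg
degree(toy_g, mode = "in")               # incoming edges only
degree(toy_g, mode = "out")              # outgoing edges only
degree(toy_g, loops = FALSE)             # ignore self-loops

max(toy_deg)                             # highest degree
names(toy_deg)[toy_deg == max(toy_deg)]  # every vertex reaching it (handles ties)

table(toy_deg)                           # degree distribution as counts
```

Comparing against `max()` rather than calling `which.max()` is a deliberate choice here: `which.max()` returns only the first maximum and would hide ties.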
Betweenness
be <- betweenness(net.bg)
be
 [1] 0 41 6 4 15 0 20 22 18 3 0 0 0 0 0 4 6 2 0 6 0 4 0 0 0
[26] 0 0 0 1 2 0 1 0 0 2 0 3 0 0 1 0 0 0 0 2 0 0 2 0 3
[51] 3 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[76] 0 0 0 0 0
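Betweenness counts, for each vertex, the shortest paths between other pairs of vertices that pass through it. Two toy graphs where the scores can be checked by hand may help build intuition (a sketch; the graphs are illustrative and unrelated to `net.bg`):

```r
library(igraph)

# Star: vertex 1 is the hub, vertices 2..5 are leaves
toy_star <- make_star(5, mode = "undirected")
betweenness(toy_star)
# The hub lies on the shortest path of every pair of leaves
choose(4, 2)

# Path 1 - 2 - 3 - 4: only the two inner vertices sit between others
toy_path <- make_graph(c(1, 2, 2, 3, 3, 4), directed = FALSE)
betweenness(toy_path)
```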
Color the nodes based on the betweenness score.
Easy color gradient
# from green to red
my_palette <- grDevices::colorRampPalette(c("green","red"))
# set the number of colors
nb_color <- 5
# a quick plot
plot(rep(1,nb_color), col=my_palette(nb_color), pch=19, cex=3)
# with cut, similar values get similar colors
my_df <- data.frame(value = be,
                    color = cut(be, nb_color, labels = my_palette(nb_color)) %>%
                      as.character) # warning: cut() returns a factor
head(my_df)
plot(net.bg, vertex.color = my_df$color)
Plot the distribution of the betweenness centrality. What is the highest betweenness? Which character has the highest betweenness?
Plot the network. With a color gradient, color each node to represent its betweenness score.
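A variant of the gradient recipe, sketched on a toy star graph: instead of using the palette as factor labels, convert the result of `cut()` to a bin index and use it to index the palette. This is only one possible way to do it, and the `toy_` names are illustrative.

```r
library(igraph)

toy_star <- make_star(6, mode = "undirected")
toy_score <- betweenness(toy_star)

toy_palette <- grDevices::colorRampPalette(c("green", "red"))
n_bins <- 3

# cut() splits the score range into equal-width bins;
# as.integer() turns the resulting factor into a bin index
toy_bin <- as.integer(cut(toy_score, breaks = n_bins))
toy_col <- toy_palette(n_bins)[toy_bin]

plot(toy_star, vertex.color = toy_col)
```

Indexing the palette avoids the factor-to-character conversion. Note that equal-width bins can put most nodes in the lowest bin when the scores are very skewed, as betweenness often is.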
Pathology
library(networkdata)
data("starwars", package = "networkdata")
# Scene Co-occurrence of Star Wars Characters (Episode 1-7)
st4 <- starwars[[4]] # Episode 4
st4
IGRAPH 99191b2 UNW- 21 60 -- Episode IV – A New Hope
+ attr: name (g/c), name (v/c), height (v/n), mass (v/n), hair_color
| (v/c), skin_color (v/c), eye_color (v/c), birth_year (v/n), sex
| (v/c), homeworld (v/c), species (v/c), weight (e/n)
+ edges from 99191b2 (vertex names):
[1] R2-D2 --CHEWBACCA R2-D2 --C-3PO R2-D2 --BERU
[4] R2-D2 --LUKE R2-D2 --OWEN R2-D2 --OBI-WAN
[7] R2-D2 --LEIA R2-D2 --BIGGS R2-D2 --HAN
[10] CHEWBACCA --OBI-WAN CHEWBACCA --C-3PO CHEWBACCA --LUKE
[13] CHEWBACCA --HAN CHEWBACCA --LEIA LUKE --CAMIE
[16] CAMIE --BIGGS LUKE --BIGGS DARTH VADER--LEIA
+ ... omitted several edges
as_adjacency_matrix(st4, attr = "weight")
21 x 21 sparse Matrix of class "dgCMatrix"
R2-D2 . 3 17 14 . . 1 5 1 1 4 . . 6 . . . . . . .
CHEWBACCA 3 . 4 14 . . . 8 . . 4 . . 19 . . . . . . .
C-3PO 17 4 . 18 . . 1 6 2 2 6 . . 6 . . . . . 1 .
LUKE 14 14 18 . . 2 4 17 3 3 19 . . 26 . . 1 1 2 3 1
DARTH VADER . . . . . . . 1 . . 1 1 7 . . . . . . . .
CAMIE . . . 2 . . 2 . . . . . . . . . . . . . .
BIGGS 1 . 1 4 . 2 . 1 . . . . . . . . . 1 2 3 .
LEIA 5 8 6 17 1 . 1 . 1 . 1 1 1 13 . . . . . 1 .
BERU 1 . 2 3 . . . 1 . 3 . . . . . . . . . . .
OWEN 1 . 2 3 . . . . 3 . . . . . . . . . . . .
OBI-WAN 4 4 6 19 1 . . 1 . . . . . 9 . . . . . . .
MOTTI . . . . 1 . . 1 . . . . 2 . . . . . . . .
TARKIN . . . . 7 . . 1 . . . 2 . . . . . . . . .
HAN 6 19 6 26 . . . 13 . . 9 . . . 1 1 . . . . .
GREEDO . . . . . . . . . . . . . 1 . . . . . . .
JABBA . . . . . . . . . . . . . 1 . . . . . . .
DODONNA . . . 1 . . . . . . . . . . . . . 1 1 . .
GOLD LEADER . . . 1 . . 1 . . . . . . . . . 1 . 1 1 .
WEDGE . . . 2 . . 2 . . . . . . . . . 1 1 . 3 .
RED LEADER . . 1 3 . . 3 1 . . . . . . . . . 1 3 . 1
RED TEN . . . 1 . . . . . . . . . . . . . . . 1 .
par(mar=c(1,1,1,1))
plot(st4, layout=layout_nicely, vertex.label.cex = 0.5, vertex.shape="none",
     edge.color = "grey")
Shortest path
sp <- shortest_paths(graph = st4,
                     from = V(st4)[name == "LUKE"],
                     to = V(st4)[name == "DARTH VADER"],
                     output = "both",
                     weights = rep(1, ecount(st4))) # or NA
sp
$vpath
$vpath[[1]]
+ 3/21 vertices, named, from 99191b2:
[1] LUKE LEIA DARTH VADER
$epath
$epath[[1]]
+ 2/60 edges from 99191b2 (vertex names):
[1] LUKE --LEIA DARTH VADER--LEIA
$predecessors
NULL
$inbound_edges
NULL
# plot
## highlight the edges on the path
ecol <- rep("gray80", ecount(st4))
ecol[unlist(sp$epath)] <- "orange"
## widen the edges on the path
esize <- rep(2, ecount(st4))
esize[unlist(sp$epath)] <- 4
## highlight and enlarge the nodes on the path
vcol <- rep("gray80", vcount(st4))
vcol[unlist(sp$vpath)] <- "gold"
vsize <- rep(1, vcount(st4))
vsize[unlist(sp$vpath)] <- 10
par(mar=c(1,1,1,1))
plot(st4, vertex.color=vcol, edge.color=ecol,
edge.width=esize, vertex.size = vsize,
     vertex.label.cex = 0.5)
All distances
distances(st4)
Diameter = longest shortest-path distance
diameter(st4)
[1] 10
What is the diameter of the graph? How long is the shortest path from Jon to Daenerys? What is the path that links Jon to Daenerys?
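When a graph carries a `weight` edge attribute, as `st4` does, `shortest_paths()`, `distances()` and `diameter()` use it by default and treat it as a cost, so a large weight makes an edge "longer". Passing `weights = NA` ignores the attribute and counts hops instead. The following sketch on a made-up weighted triangle shows how much the two readings can differ:

```r
library(igraph)

toy_df <- data.frame(from   = c("A", "B", "A"),
                     to     = c("B", "C", "C"),
                     weight = c(5, 5, 20))
toy_w <- graph_from_data_frame(toy_df, directed = FALSE)

# Weighted: the detour through B (5 + 5) beats the direct edge (20)
as_ids(shortest_paths(toy_w, from = "A", to = "C")$vpath[[1]])
distances(toy_w)["A", "C"]
diameter(toy_w)

# weights = NA ignores the attribute and counts hops
as_ids(shortest_paths(toy_w, from = "A", to = "C", weights = NA)$vpath[[1]])
distances(toy_w, weights = NA)["A", "C"]
diameter(toy_w, weights = NA)
```

For co-occurrence networks, where a high weight means a stronger tie, it is worth deciding explicitly which of the two readings answers the question being asked.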
Ego
eg <- make_ego_graph(st4, nodes = "DARTH VADER", order = 1)
plot(eg[[1]])
What is the degree of Jon? Plot the ego network of order 1 of Jon.
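The `order` argument sets how many steps away from the focal node the ego network extends. A small sketch on a made-up toy graph relates it to the degree: in a graph without loops or multiple edges, the ego network of order 1 contains the node plus as many neighbours as its degree.

```r
library(igraph)

toy_g <- make_graph(c("A", "B", "A", "C", "B", "C", "C", "D", "D", "E"),
                    directed = FALSE)

degree(toy_g, v = "C")                       # number of direct neighbours here

# make_ego_graph() returns a list with one graph per requested node
toy_ego1 <- make_ego_graph(toy_g, order = 1, nodes = "C")[[1]]
V(toy_ego1)$name                             # C plus its neighbours

toy_ego2 <- make_ego_graph(toy_g, order = 2, nodes = "C")[[1]]
vcount(toy_ego2)                             # order 2 also reaches E
```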
Modularity
Cliques
Cliques = complete subgraphs of an undirected graph.
cl <- cliques(st4, min = 4)
head(cl)
[[1]]
+ 4/21 vertices, named, from 99191b2:
[1] R2-D2 C-3PO LUKE OWEN
[[2]]
+ 4/21 vertices, named, from 99191b2:
[1] R2-D2 C-3PO LUKE LEIA
[[3]]
+ 4/21 vertices, named, from 99191b2:
[1] LUKE GOLD LEADER WEDGE RED LEADER
[[4]]
+ 4/21 vertices, named, from 99191b2:
[1] C-3PO LUKE LEIA RED LEADER
[[5]]
+ 4/21 vertices, named, from 99191b2:
[1] LUKE DODONNA GOLD LEADER WEDGE
[[6]]
+ 4/21 vertices, named, from 99191b2:
[1] DARTH VADER LEIA MOTTI TARKIN
cl <- largest_cliques(st4)
vcol <- rep("grey80", vcount(st4))
vcol[unlist(cl)] <- "gold"
par(mar=c(1,1,1,1))
plot(st4, vertex.color = vcol, vertex.size = 4)
induced_subgraph(graph = st4, vids = unlist(cl)) %>%
  plot(main = "Largest clique (subgraph)", layout = layout_in_circle)
What is the largest clique? Plot it.
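The difference between `cliques()` (every complete subgraph within the size bounds) and `largest_cliques()` (only those of maximum size) is easier to see on a graph small enough to check by hand. The sketch below uses a made-up graph: a complete graph on four vertices plus one pendant vertex.

```r
library(igraph)

# Complete graph on A..D plus a pendant vertex E attached to D
toy_g <- graph_from_literal(A - B, A - C, A - D, B - C, B - D, C - D, D - E)

# All cliques with at least 3 vertices: four triangles and the 4-clique
length(cliques(toy_g, min = 3))

# largest_cliques() returns a list with one entry per maximum clique
toy_largest <- largest_cliques(toy_g)
sort(as_ids(toy_largest[[1]]))

# A clique is complete: every pair of its vertices is linked
toy_sub <- induced_subgraph(toy_g, vids = toy_largest[[1]])
ecount(toy_sub) == choose(vcount(toy_sub), 2)
```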
Community detection
cluster <- cluster_walktrap(st4) # louvain, edge_betweenness, ...
cluster
IGRAPH clustering walktrap, groups: 3, mod: 0.15
+ groups:
$`1`
[1] "R2-D2" "CHEWBACCA" "C-3PO" "LUKE" "LEIA" "BERU"
[7] "OWEN" "OBI-WAN" "HAN" "GREEDO" "JABBA"
$`2`
[1] "CAMIE" "BIGGS" "DODONNA" "GOLD LEADER" "WEDGE"
[6] "RED LEADER" "RED TEN"
$`3`
[1] "DARTH VADER" "MOTTI" "TARKIN"
+ ... omitted several groups/vertices
class(cluster)
[1] "communities"
plot_dendrogram(cluster, mode="hclust")
par(mar=c(1,1,1,1))
plot(cluster, st4, layout=layout_nicely, vertex.label.cex = 0.5,
     vertex.shape="none")
From your graph, identify modules. How many modules are there? How many characters does the largest module contain?
Display the distance dendrogram. Display the module in the graph.
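A `communities` object can be queried with a few accessor functions rather than read off the printed summary. The sketch below applies them to a made-up toy graph (two triangles joined by one edge). The partition that walktrap finds is not stated here, since it is best checked by running the code.

```r
library(igraph)

# Two triangles joined by a single bridge C - D
toy_g <- graph_from_literal(A - B, B - C, A - C, C - D, D - E, E - F, D - F)

toy_cl <- cluster_walktrap(toy_g)

length(toy_cl)        # number of modules
sizes(toy_cl)         # vertices per module
membership(toy_cl)    # module id of each vertex
modularity(toy_cl)    # quality of the partition

# Module ids can be reused as a vertex attribute, e.g. for coloring
V(toy_g)$module <- as.integer(membership(toy_cl))
```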